feat: Complete Greenhouse API implementation with CLI tools #25
Open
rothnic wants to merge 4 commits into Pickle-Pixel:main from
Add comprehensive Greenhouse ATS scraping to capture jobs from AI/ML startups:
New Module: src/applypilot/discovery/greenhouse.py
- HTML scraping with BeautifulSoup for job-boards.greenhouse.io
- Parallel execution with ThreadPoolExecutor
- Location filtering (remote detection, accept/reject patterns)
- Query matching for job title filtering
- Duplicate prevention via URL-based deduplication
New Config: src/applypilot/config/greenhouse.yaml
- 129 verified Greenhouse employers
- Organized by category: Core AI, Infrastructure, Fintech, Healthcare, etc.
- Companies: Scale AI, Stripe, Figma, Notion, MongoDB, Datadog, etc.
Pipeline Integration:
- Wired into _run_discover() alongside JobSpy, Workday, SmartExtract
- Stats tracking for new/existing jobs
- Error handling with graceful degradation
Testing:
- Comprehensive unit tests in tests/discovery/test_greenhouse.py
- Verified with Scale AI: found 32 jobs including ML Engineer roles
- All 129 employers load successfully
- Parallel search tested with 4 workers
Closes: Option A for expanding AI company coverage
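The filtering, deduplication, and parallel-search steps in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's actual code: the function names (`location_ok`, `title_ok`, `search_all`), the remote-detection regex, the job dict keys (`url`, `title`, `location`), and the rule that reject patterns take priority over remote detection are all hypothetical.

```python
from __future__ import annotations

import re
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

# Hypothetical remote markers; the real module's patterns may differ.
REMOTE_RE = re.compile(r"\b(remote|anywhere|distributed)\b", re.IGNORECASE)


def _any_match(patterns: list[str], text: str) -> bool:
    return any(re.search(p, text, re.IGNORECASE) for p in patterns)


def location_ok(location: str, accept: list[str], reject: list[str]) -> bool:
    """Reject patterns win, remote roles pass, else require an accept match.

    An empty accept list accepts any location that was not rejected.
    """
    loc = location or ""
    if _any_match(reject, loc):
        return False
    if REMOTE_RE.search(loc):
        return True
    return not accept or _any_match(accept, loc)


def title_ok(title: str, queries: list[str]) -> bool:
    """Case-insensitive substring match; an empty query list matches all."""
    t = title.lower()
    return not queries or any(q.lower() in t for q in queries)


def search_all(
    slugs: list[str],
    fetch: Callable[[str], list[dict]],
    queries: list[str],
    accept: list[str],
    reject: list[str],
    workers: int = 4,
) -> tuple[list[dict], dict[str, str]]:
    """Fetch boards in parallel, filter jobs, and dedupe by URL.

    A board whose fetch raises is recorded in `errors` and skipped,
    so one bad slug does not abort the whole run.
    """
    seen: set[str] = set()
    results: list[dict] = []
    errors: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, slug): slug for slug in slugs}
        # Results are consumed on this thread, so `seen` needs no lock.
        for fut in as_completed(futures):
            slug = futures[fut]
            try:
                jobs = fut.result()
            except Exception as exc:
                errors[slug] = str(exc)
                continue
            for job in jobs:
                if job["url"] in seen:
                    continue
                seen.add(job["url"])
                if title_ok(job["title"], queries) and location_ok(
                    job["location"], accept, reject
                ):
                    results.append(job)
    return results, errors
```

Passing `fetch` in as a callable keeps the concurrency and filtering logic independent of whether jobs come from HTML scraping or from the Job Board API that the next commit switches to. Because `as_completed` yields boards in completion order, the order of `results` is not deterministic.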
Replace HTML scraping with Greenhouse Job Board API:
- Use boards-api.greenhouse.io/v1/boards/{token}/jobs endpoint
- Add full job descriptions from API (content=true parameter)
- Add new fields: job_id, updated_at, offices, description
- Add retry logic for rate limits (HTTP 429)
- Add user config override (~/.applypilot/greenhouse.yaml)
- Remove BeautifulSoup dependency for this module
- Update all tests for API-based implementation
- 25 tests passing
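The API fetch with HTTP 429 retry might look roughly like the following minimal sketch. It uses only the standard library so that it runs without extra installs (the PR itself uses httpx). The endpoint, the `content=true` parameter, and the `job_id`/`updated_at`/`offices`/`description` fields come from this commit. The helper names, the backoff schedule, honoring an integer `Retry-After` header, and the other output keys are assumptions. The response fields read here (`jobs`, `id`, `absolute_url`, `location.name`, `updated_at`, `offices`, `content`) follow the public Greenhouse Job Board API.

```python
from __future__ import annotations

import html
import json
import time
import urllib.error
import urllib.request
from typing import Callable

JOBS_URL = "https://boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true"


def http_get_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode the JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def get_with_retry(
    url: str,
    get: Callable[[str], dict] = http_get_json,
    retries: int = 3,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Retry only on HTTP 429, honoring an integer Retry-After header if sent,
    otherwise backing off exponentially (2s, 4s, 8s with the defaults)."""
    attempt = 0
    while True:
        try:
            return get(url)
        except urllib.error.HTTPError as exc:
            if exc.code != 429 or attempt >= retries:
                raise
            retry_after = (exc.headers or {}).get("Retry-After")
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = backoff * 2**attempt
            sleep(delay)
            attempt += 1


def parse_jobs(token: str, payload: dict) -> list[dict]:
    """Map Job Board API records to flat dicts (keys are illustrative)."""
    jobs = []
    for job in payload.get("jobs", []):
        jobs.append(
            {
                "job_id": job["id"],
                "title": job.get("title", ""),
                "url": job.get("absolute_url", ""),
                "location": (job.get("location") or {}).get("name", ""),
                "updated_at": job.get("updated_at"),
                "offices": [o.get("name", "") for o in job.get("offices", [])],
                # With content=true the description arrives entity-escaped
                # (e.g. "&lt;p&gt;"); unescape it back to HTML.
                "description": html.unescape(job.get("content") or ""),
                "company": token,
            }
        )
    return jobs


def fetch_board(token: str, get=http_get_json, sleep=time.sleep) -> list[dict]:
    """Fetch and parse one board, retrying on rate limits."""
    payload = get_with_retry(JOBS_URL.format(token=token), get=get, sleep=sleep)
    return parse_jobs(token, payload)
```

Injecting `get` and `sleep` lets unit tests simulate 429 responses without network access or real delays. Only 429 is retried: a 404 (for example, a wrong board token) is raised immediately. This sketch does not handle a `Retry-After` header given as an HTTP date.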
Integrate greenhouse CLI into main applypilot CLI:
- verify: Check if company slug is valid
- discover: Find slugs from company name or career URL
- validate: Check all companies in greenhouse.yaml
- list-employers: Display configured employers
- add-job: Add specific job from URL with structured display
Commands available via: applypilot greenhouse <command>
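Both `discover` (from a career URL) and `add-job` start by pulling identifiers out of a URL. Below is a minimal sketch of extracting the board slug and job ID from common Greenhouse URL shapes and then building the Job Board API single-job URL. The helper names and the set of recognized URL shapes (hosted boards, `/embed/` links, `?gh_jid=` on company career pages) are assumptions. Discovery from a company name is not covered.

```python
from __future__ import annotations

from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hosts that serve hosted Greenhouse boards (assumed list, not exhaustive).
GH_HOSTS = {"boards.greenhouse.io", "job-boards.greenhouse.io"}


def parse_greenhouse_url(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (board slug, job id) from a Greenhouse URL; None when unknown."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    # Career pages that embed Greenhouse often carry ?gh_jid=<job id>.
    job_id = query.get("gh_jid", [None])[0]
    host = (parts.hostname or "").lower()
    if host not in GH_HOSTS:
        # A company-hosted page: the slug cannot be read off the URL alone.
        return None, job_id
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] == "embed":
        # e.g. /embed/job_app?for=<slug>&token=<job id>
        return query.get("for", [None])[0], query.get("token", [job_id])[0]
    slug = segments[0] if segments else None
    # e.g. /<slug>/jobs/<job id>
    if len(segments) >= 3 and segments[1] == "jobs" and segments[2].isdigit():
        job_id = segments[2]
    return slug, job_id


def job_api_url(slug: str, job_id: str) -> str:
    """Job Board API endpoint for a single posting."""
    return f"https://boards-api.greenhouse.io/v1/boards/{slug}/jobs/{job_id}"
```

For example, with a made-up job ID, `parse_greenhouse_url("https://job-boards.greenhouse.io/scaleai/jobs/4567")` returns `("scaleai", "4567")`. A `verify`-style check could then request the board's jobs endpoint and treat a 404 as an invalid slug.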
- Update CHANGELOG: reflect API-based approach (not HTML scraping)
- Document new CLI commands and user config override
- Update README: add 129 Greenhouse employers to supported sources
- Fix notion → notionhq slug in greenhouse.yaml
Summary
This PR implements complete Greenhouse ATS support using the official Greenhouse Job Board API, adding 129 pre-configured employers (focused on AI/ML companies) and CLI management tools.
What's Included
Core Greenhouse Integration
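- Uses the `boards-api.greenhouse.io/v1/boards/{token}/jobs` endpoint for reliable, structured job data
- Fetches full job descriptions via the `?content=true` parameter
- Supports a user config override at `~/.applypilot/greenhouse.yaml` for custom employer lists
CLI Management Tools (5 Commands)
- `verify`, `discover`, `validate`, `list-employers`, and `add-job`, available via `applypilot greenhouse <command>`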
Pre-configured Employers
Test Coverage
✅ 25 tests passing
Verified Working
Files Added/Modified
New Dependencies
None. Uses existing project dependencies (httpx, pyyaml, typer).
Breaking Changes
None. This is a pure addition; all existing functionality remains unchanged.
Usage Example
Notes